SeqFormer: Sequential Transformer for Video Instance Segmentation
Authors
Abstract
In this work, we present SeqFormer for video instance segmentation. SeqFormer follows the principle of the vision transformer that models instance relationships among video frames. Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms shall be done with each frame independently. To achieve this, SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to predict the mask sequences on each frame dynamically. Instance tracking is achieved naturally without tracking branches or post-processing. On YouTube-VIS, SeqFormer achieves 47.4 AP with a ResNet-50 backbone and 49.0 AP with a ResNet-101 backbone without bells and whistles. Such achievement significantly exceeds the previous state-of-the-art performance by 4.6 and 4.4, respectively. In addition, integrated with the recently-proposed Swin transformer, SeqFormer achieves a much higher AP of 59.3. We hope SeqFormer could be a strong baseline that fosters future research in video instance segmentation and, in the meantime, advances this field with a more robust, accurate, neat model. The code is available at https://github.com/wjf5203/SeqFormer.
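The aggregation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (that is in the repository linked above). It assumes one embedding per frame for a single instance, a hypothetical scoring vector `score_weights` that turns each frame embedding into a softmax weight, and a plain per-pixel dot product standing in for the dynamic mask prediction. All function and parameter names here are made up for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def aggregate_frames(frame_embeddings: List[Vector], score_weights: Vector) -> Vector:
    """Combine per-frame instance embeddings into one video-level vector.

    Assumption: each frame gets a scalar score (dot product with
    score_weights), and the scores are normalised over time with softmax.
    """
    weights = softmax([dot(e, score_weights) for e in frame_embeddings])
    dim = len(frame_embeddings[0])
    return [
        sum(w * e[d] for w, e in zip(weights, frame_embeddings))
        for d in range(dim)
    ]


def predict_masks(video_embedding: Vector,
                  frame_features: List[List[List[Vector]]],
                  threshold: float = 0.0) -> List[List[List[bool]]]:
    """Apply the same video-level vector as a per-pixel filter on every frame.

    frame_features is indexed as [frame][row][column] -> feature vector.
    """
    return [
        [[dot(video_embedding, pixel) > threshold for pixel in row] for row in frame]
        for frame in frame_features
    ]


# Three frames, 2-dimensional embeddings, uniform scoring weights.
video = aggregate_frames([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
# Each frame is a 1x2 feature map.
masks = predict_masks(video, [[[[1.0, 0.0], [-1.0, 0.0]]]] * 3)
```

Because the same video-level vector produces the mask on every frame, the masks it yields belong to one instance across the whole clip, which is the sense in which no separate tracking step is needed in this toy setting.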
Similar resources
MaskRNN: Instance Level Video Object Segmentation
Instance level video object segmentation is an important technique for video editing and compression. To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance — a binary segmentation net providing a mask and a localization net providing a bounding box. Due to the recurrent...
Sequential Monte Carlo video text segmentation
This paper presents a probabilistic algorithm for segmenting and recognizing text embedded in video sequences. The algorithm approximates the posterior distribution of segmentation thresholds of video text by a set of weighted samples. After initialization the set of samples is recursively refined by random sampling under a temporal Bayesian framework. The proposed methodology allows us to esti...
Instance Embedding Transfer to Unsupervised Video Object Segmentation
We propose a method for unsupervised video object segmentation by transferring the knowledge encapsulated in image-based instance embedding networks. The instance embedding network produces an embedding vector for each pixel that enables identifying all pixels belonging to the same object. Though trained on static images, the instance embeddings are stable over consecutive video frames, which a...
Sequential Instance-Based Learning
This paper presents and evaluates sequential instance-based learning (SIBL), an approach to action selection based upon data gleaned from prior problem solving experiences. SIBL learns to select actions based upon sequences of consecutive states. The algorithms rely primarily on sequential observations rather than a complete domain theory. We report the results of experiments on fixed-length an...
Shape-aware Instance Segmentation
We address the problem of instance-level semantic segmentation, which aims at jointly detecting, segmenting and classifying every individual object in an image. In this context, existing methods typically propose candidate objects, usually as bounding boxes, and directly predict a binary mask within each such proposal. As a consequence, they cannot recover from errors in the object candidate ge...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-19815-1_32